Henry S. Thompson
HCRC Language Technology Group
Division of Informatics
University of Edinburgh
© 2003 Henry S. Thompson
Why a style language?
Style control for HTML
Two approaches to style for XML:
When you see this, it means there’s accompanying information in the Additional Notes
Separating form from content
Separating structure from appearance
Single source, multiple delivery media
Document Compilers: ASCII text with formatting instructions and body text intermixed
WYSIWYG Word Processors: Out-of-band formatting instructions change appearance on-screen; proprietary file formats.
(Semi-)Structured Markup: Markup has either intrinsic or extrinsic rendering consequences.
The old document compilers
The WYSIWYG systems
SGML solved the proprietary format problem
But for a long time there was no standard way of formatting SGML documents for printing or viewing
So HTML (nearly/post-hoc an SGML application), by mandating a rendering semantics for all its semi-structural markup, filled a real need.
But it was
Style standard for SGML?
Customise HTML page appearance?
Extend HTML tag-set and control style?
Style for XML, London 1998-11-25
Technology Appraisals
Henry S. Thompson
Level 1 Accepted Recommendation per W3C, December 1996
Level 2 Accepted Recommendation, May 1998
Addresses the problems of:
Initially driven by the need for site designers to differentiate the appearance of their pages from one another
Focus accordingly is on controlling the colour, size and shape of regions and fonts
HTML example:
<HTML>
<HEAD><TITLE>Example file</TITLE></HEAD>
<BODY>
<H2>Example text</H2>
<P>Here is some text. It's a paragraph of text in fact. But with very little content. Pretty boring if you ask me.
<P>
And some more text.
Again with very little content. All marked up
in vanilla HTML.
</BODY></HTML>
You can change the way HTML tags are rendered:
CSS style rules associate properties with elements in your documents which match selectors
The basic structure of a rule looks like this:
selector[, selector ...] {pname: pvalue[; pname: pvalue ...]}
Simple examples:
verbatim {white-space: pre}
H1 {text-align: center; font-variant: small-caps}
The first would provide style for an XML document
The second would change HTML's H1
Customising HTML
Formatting XML
Contents of STYLE element in the HTML header
Destination of an appropriate LINK element
In STYLE attributes on any HTML element
Rules can have one or more selectors, separated with commas
Simple names select elements by name
In addition to element type names, other selector syntax includes
Sometimes you need context-sensitive selectors
For depth-sensitive rendering
OL {list-style-type: lower-alpha}
OL OL {list-style-type: lower-roman}
For context-appropriate rendering
H1 {font-weight: bold;font-size: large}
H2 {font-weight: bold;font-style: italic}
H3 {font-style: italic}
H2 EM,H3 EM {font-style: normal}
Note that the last rule has two selectors, separated by a comma, sharing the same declarations
CSS uses a nested-boxes rendering model, and every block element is rendered into a box
Boxes all have margins, borders and padding (outside in)
All four margins and paddings (left, right, top, bottom) have width properties, and there is a shorthand property for setting all four together
Borders, in addition to widths, have colours and styles, plus shorthand properties for various combinations
There are also float and clear properties to allow a modest amount of displacement and flow-around.
CSS2 goes a lot further with this
P { margin: 3ex;
    border-width: thin;
    border-style: solid;
    border-left: double;
    text-align: justify;
    border-color: blue;
    padding: 2ex 4ex }
gives the following for a sample paragraph
Some are symbolic, e.g. font-style: italic
URLs appear in a few places, e.g. background-image: url(http://www...)
Most are
What happens when there is more than one rule which provides a value for a property on a given element?
The highest priority value assignment wins
When no assignment is found, the value is either inherited or defaulted
This explains why our original H1 example was bold
A number of things contribute to determining priority
The following are in increasing order of priority
LI
UL LI
UL OL LI
LI.special
OL LI.special
#hotone
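The ordering above can be checked with a small sketch. It is written in Python rather than CSS, and it models only selector specificity (counts of IDs, classes and element names); the function name specificity is invented for illustration, and the other contributors to priority (origin, importance, rule order) are deliberately left out.

```python
import re

def specificity(selector):
    """Return (ids, classes, elements) for a simple CSS1-style selector.

    Illustrative only: handles element names, .class and #id parts
    separated by spaces, nothing more.
    """
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # Remove the #id and .class parts so that only element names remain.
    stripped = re.sub(r"[#.][\w-]+", "", selector)
    elements = len(re.findall(r"[A-Za-z][\w-]*", stripped))
    return (ids, classes, elements)

# The selectors from the list above, in the order given there.
SELECTORS = ["LI", "UL LI", "UL OL LI", "LI.special", "OL LI.special", "#hotone"]

# Tuples compare left to right, so an ID outranks any number of classes,
# and a class outranks any number of element names.
ranked = sorted(SELECTORS, key=specificity)
```

Sorting by the tuple leaves the list in the order shown above, which is what "increasing order of priority" amounts to when only specificity is in play.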
In principle, it's easy
In practice
Style sheet linkage is via a processing instruction (PI)
An ISO standard (ISO 10179:1996)
A style language
A transformation language
A hopeless acronym
A lost opportunity!
Main properties
Portable standard style specification
Single source documents, multiple delivery media
Multiple document types, single house style
Just as much complexity as you need
Controlling filling and line breaking:
Page or line fidelity:
Carefully crafted page layout:
User interaction:
CSS takes a document tree and decorates it with formatting properties
XSL takes a document tree and builds a new document tree which it then decorates
XSLT style sheet: template rules
XSLT processor
From XML to XML
Three places it can happen
Modular
Localised
Scoped
Unbiased
CSS takes a document tree and decorates it with formatting properties
XSLT takes a (source) document tree and builds a new (result) document tree
If the result tree's vocabulary defines appearance, then XSLT can be a style language
No parentheses!
XSLT is notated with XML element types
DSSSL semantics without DSSSL syntax
The main component of an XSL stylesheet is the template rule
Each template rule contains
<xsl:template match='div/title'>
  <fo:block font-weight='bold'>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
The match attribute holds the pattern: div/ is the restriction on match context, title is the element type to match
fo:block is the formatting object to be created
xsl:apply-templates gives the content of the formatting object: use the subordinate results
[Figure: source tree (div containing title and its text) mapped to a result Block formatting object with font-weight: bold containing the same text]
We could try to translate our example into CSS as follows:
div title { font-weight: bold }
But that would actually be wrong:
XSL does not require a one-to-one relation between source and destination
XSL can restrict matches based on
These are expressed in the form of path expressions, which are shared with the draft XPointer proposal
The common part is called XPath
/ for (root's) children
// for (root's) descendants
.. for parent
name for matching elements
@name for matching attributes
[. . .] for conditions
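To get a feel for these path steps, here is a sketch using Python's xml.etree.ElementTree, which implements only a limited subset of XPath (it cannot select attribute nodes with @name, for instance, only test them inside a condition). The sample document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small invented document: a top-level title plus titled, nested divs.
doc = ET.fromstring(
    "<doc>"
    "<title>Top</title>"
    "<div id='a'><title>First</title><para>x</para></div>"
    "<div id='b'><title>Second</title>"
    "<div id='c'><title>Nested</title></div></div>"
    "</doc>"
)

# name: title elements that are children of the context node (doc).
child_titles = [t.text for t in doc.findall("title")]

# //: title elements anywhere below the context node, in document order.
all_titles = [t.text for t in doc.findall(".//title")]

# A parent/child path, like the div/title pattern used earlier.
div_titles = [t.text for t in doc.findall("div/title")]

# [...]: a condition on an attribute restricts which div is considered.
cond_titles = [t.text for t in doc.findall("div[@id='b']/title")]

# ..: step from each para up to its parent element.
para_parents = [d.get("id") for d in doc.findall(".//para/..")]
```

A full XPath implementation accepts the same paths and many more; the point here is only how each step narrows or widens the set of matched nodes.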
With all these pattern variants, what happens if two rules match?
Drawing on both DSSSL and CSS, there is a set of precedence rules
Basically, the richer the pattern, the higher its precedence
If all else fails, there is a numeric priority attribute
The 'action' part of a rule isn't much like an action at all
It's more like a picture of what you want in the way of formatting objects
Nesting is specified directly
So you can build up quite detailed formatting object structures
The special xsl:apply-templates element type determines where the formatting objects resulting from processing the children of the matched node should be plugged in
This 'action' builds a rich result structure
<p>
  <span style='font-size: 150%'>
    <xsl:value-of select='@name'/>
    <xsl:text>. . . .</xsl:text>
  </span>
  <em>
    <xsl:apply-templates/>
  </em>
</p>
<xsl:template match='demo'>
  <HTML>
    <BODY>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>
<xsl:template match='para'>
  <P>
    <xsl:apply-templates/>
  </P>
</xsl:template>
[Figure: source tree (demo containing two para elements with their text) mapped to result tree (HTML containing BODY containing two P elements with the same text)]
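The way the demo and para rules cooperate can be imitated in a few lines of Python. This is a rough sketch of the processing model, not how an XSLT processor is implemented: the function names, the RULES table and the sample source are all invented, and the fallback for unmatched elements only approximates the default rules.

```python
import xml.etree.ElementTree as ET

def add_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def apply_templates(node, rules, out_parent):
    """Process node's children; plug their results into out_parent."""
    add_text(out_parent, node.text)
    for child in node:
        process(child, rules, out_parent)
        add_text(out_parent, child.tail)

def process(node, rules, out_parent):
    """Find the rule for node; with no rule, just process its children."""
    rule = rules.get(node.tag)
    if rule is None:
        apply_templates(node, rules, out_parent)
    else:
        rule(node, rules, out_parent)

def demo_rule(node, rules, out_parent):
    # Mirrors the template for 'demo': HTML > BODY > results of children.
    html = ET.SubElement(out_parent, "HTML")
    body = ET.SubElement(html, "BODY")
    apply_templates(node, rules, body)

def para_rule(node, rules, out_parent):
    # Mirrors the template for 'para': P > results of children.
    p = ET.SubElement(out_parent, "P")
    apply_templates(node, rules, p)

RULES = {"demo": demo_rule, "para": para_rule}

src = ET.fromstring(
    "<demo><para>The first</para><para>The <b>second</b></para></demo>"
)
holder = ET.Element("result")  # scratch parent for the result tree
process(src, RULES, holder)
html_text = ET.tostring(holder[0], encoding="unicode")
```

Each rule describes the shape of its piece of the result and says where the children's results go, which is the role xsl:apply-templates plays in the templates above.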
Add a <style> element to the template for the root
You may not always want to just invoke processing on a node's children in the ordinary way
You can supply a select attribute on xsl:apply-templates to specify what you want processed
If all you want is the text content of an element or attribute as such
<xsl:template match='/'>
  <HTML>
    <HEAD>
      <TITLE>
        <xsl:value-of select='doc/title'/>
      </TITLE>
    </HEAD>
    <BODY>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>
You may not even want material to appear in the output in the same order it appears in the source, e.g. if the source was derived from a database
select can be used to reorder by pulling out first one child type, then another, etc.
<xsl:apply-templates select='a'/> <xsl:apply-templates select='b'/>
All a's will end up before all b's, regardless of where they started
xsl:sort provides more detailed control
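The reordering effect of the two select-driven calls can be sketched in Python with xml.etree.ElementTree. The source document here is invented, and the loop only imitates what the pair of xsl:apply-templates instructions achieves; it is not XSLT.

```python
import xml.etree.ElementTree as ET

# Invented source in which a and b children are interleaved.
src = ET.fromstring("<r><b>1</b><a>2</a><b>3</b><a>4</a></r>")

out = ET.Element("r")
# First pull out every a child, then every b child, like
# apply-templates select='a' followed by apply-templates select='b'.
for name in ("a", "b"):
    for child in src.findall(name):
        out.append(child)

reordered = ET.tostring(out, encoding="unicode")
```

Within each group the original document order is kept; only the grouping changes.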
XSL has two default rules, similar to DSSSL's
XML to XML can be very useful
Sophisticated applications can be built by combining multiple XSLT-implemented transformations
The core of every serious transformation
<xsl:template match="@*|*|comment()|processing-instruction()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
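The idea behind this copy-everything rule is that more specific rules override it for the few nodes that need changing. A Python sketch of that pattern follows; the names transform and rename_para and the sample document are invented, and ElementTree drops comments and processing instructions when parsing by default, so only elements, attributes and text are copied here.

```python
import xml.etree.ElementTree as ET

def transform(node, overrides):
    """Copy node recursively unless overrides has a handler for its tag."""
    handler = overrides.get(node.tag)
    if handler is not None:
        return handler(node, overrides)
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    for child in node:
        copy.append(transform(child, overrides))
    return copy

def rename_para(node, overrides):
    """One specific rule: emit P for para, then carry on as usual below it."""
    p = ET.Element("P", dict(node.attrib))
    p.text, p.tail = node.text, node.tail
    for child in node:
        p.append(transform(child, overrides))
    return p

src = ET.fromstring(
    "<demo kind='x'><para>The <em>f</em>irst</para><note>keep</note></demo>"
)
copied = ET.tostring(transform(src, {}), encoding="unicode")
changed = ET.tostring(transform(src, {"para": rename_para}), encoding="unicode")
```

With no overrides the output equals the input; adding one override changes just the matching elements and leaves everything else untouched.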
XSLT is a pure functional language
You can bind variables
<xsl:variable name="currencySymbol"> £ </xsl:variable>
<xsl:variable name="title" select="/catalog/title"/>
And access them
<xsl:value-of select="$currencySymbol"/>
The document function allows access within a stylesheet to named other documents
If bound to a variable, can then be used as the starting point for a search
<xsl:variable name="catalog" select="document('exa15.xml')/*"/>
. . .
<xsl:value-of select="$catalog/entry[number='E102']/price"/>
James Clark has implemented most of XSLT
IE5+ with the MSXML4 product (http://msdn.microsoft.com/xml/) supports the whole language
Others are implementing subsets of the formatting semantics
Usage is increasing very rapidly